H-TD<sup>2</sup>: Hybrid Temporal Difference Learning for Adaptive Urban Taxi Dispatch
Abstract
We present H-TD<sup>2</sup>: Hybrid Temporal Difference Learning for Taxi Dispatch, a model-free, adaptive decision-making algorithm to coordinate a large fleet of automated taxis in a dynamic urban environment to minimize expected customer waiting times. Our scalable algorithm exploits the natural transportation network company topology by switching between two behaviors: distributed temporal-difference learning computed locally at each taxi and infrequent centralized Bellman updates computed at the dispatch center. We derive a regret bound and design the trigger condition between the two behaviors to explicitly control the trade-off between computational complexity and the individual taxi policy's bounded sub-optimality; this advances the state of the art by enabling distributed operation with bounded sub-optimality. Additionally, unlike recent reinforcement learning dispatch methods, this policy estimation is robust to out-of-training domain events. This result is enabled by a two-step modelling approach: the policy is learned on an agent-agnostic, cell-based Markov Decision Process, and individual taxis are coordinated using a game-theoretic task assignment. We validate our algorithm against a receding horizon control baseline in a Gridworld environment with a simulated customer dataset, where the proposed solution decreases average customer waiting time by 50% over a wide range of parameters. We also validate in a Chicago city environment with real customer requests from the Chicago taxi public dataset, where the proposed solution decreases average customer waiting time by 26% over irregular customer distributions during a 2016 Major League Baseball World Series game.
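The switching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a tiny tabular Markov chain under a fixed policy, a plain TD(0) update standing in for the local learning step, a synchronous Bellman backup standing in for the centralized update, and a hypothetical trigger that fires when the absolute TD error exceeds a fixed `threshold`. The paper derives its own trigger condition from a regret bound, which is not reproduced here. All function names and parameters below are illustrative assumptions.

```python
import random


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular TD(0) step in place; return the absolute TD error."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return abs(td_error)


def bellman_sweep(values, transitions, rewards, gamma=0.9):
    """One synchronous Bellman expectation backup over all states.

    transitions[s] is a list of (probability, next_state) pairs
    under a fixed policy; rewards[s] is the reward for leaving s.
    """
    return [
        rewards[s] + gamma * sum(p * values[s2] for p, s2 in transitions[s])
        for s in range(len(values))
    ]


def sample_next_state(rng, outcomes):
    """Draw a next state from a list of (probability, next_state) pairs."""
    draw = rng.random()
    cumulative = 0.0
    for probability, next_state in outcomes:
        cumulative += probability
        if draw < cumulative:
            return next_state
    return outcomes[-1][1]  # guard against floating-point round-off


def hybrid_policy_evaluation(transitions, rewards, steps=2000,
                             threshold=0.5, seed=0):
    """Cheap local TD(0) steps, with an occasional full Bellman sweep.

    The threshold trigger is a hypothetical stand-in, not the
    trigger condition designed in the paper.
    """
    rng = random.Random(seed)
    values = [0.0] * len(rewards)
    central_updates = 0
    state = 0
    for _ in range(steps):
        next_state = sample_next_state(rng, transitions[state])
        error = td0_update(values, state, rewards[state], next_state)
        if error > threshold:  # large surprise: fall back to the full model
            values = bellman_sweep(values, transitions, rewards)
            central_updates += 1
        state = next_state
    return values, central_updates


if __name__ == "__main__":
    # Two-state deterministic cycle: 0 -> 1 -> 0, reward 1 when leaving state 0.
    chain = [[(1.0, 1)], [(1.0, 0)]]
    values, central_updates = hybrid_policy_evaluation(chain, [1.0, 0.0])
    print(values, central_updates)
```

Raising `threshold` in this sketch makes the full sweeps rarer, which loosely mirrors the trade-off the abstract describes between computational cost and how far the local estimate may drift.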
Similar resources
Spatio-temporal Efficiency in a Taxi Dispatch System
In this paper, we present an empirical analysis of the GPS-enabled taxi dispatch system used by the world’s second largest land transportation company. We first summarize the collective dynamics of the more than 6,000 taxicabs in this fleet. Next, we propose a simple method for evaluating the efficiency of the system over a given period of time and geographic zone. Our method yields valuable in...
Adaptive Lambda Least-Squares Temporal Difference Learning
Temporal Difference learning or TD(λ) is a fundamental algorithm in the field of reinforcement learning. However, setting TD’s λ parameter, which controls the timescale of TD updates, is generally left up to the practitioner. We formalize the λ selection problem as a bias-variance trade-off where the solution is the value of λ that leads to the smallest Mean Squared Value Error (MSVE). To solve...
Adaptive Step-Size for Online Temporal Difference Learning
The step-size, often denoted as α, is a key parameter for most incremental learning algorithms. Its importance is especially pronounced when performing online temporal difference (TD) learning with function approximation. Several methods have been developed to adapt the step-size online. These range from straightforward back-off strategies to adaptive algorithms based on gradient descent. We de...
Dual Temporal Difference Learning
Recently, researchers have investigated novel dual representations as a basis for dynamic programming and reinforcement learning algorithms. Although the convergence properties of classical dynamic programming algorithms have been established for dual representations, temporal difference learning algorithms have not yet been analyzed. In this paper, we study the convergence properties of tempor...
Preconditioned Temporal Difference Learning
LSTD is numerically unstable for some ergodic Markov chains in which some states are visited preferentially over the remaining ones, because the matrix that LSTD accumulates has a large condition number. In this paper, we propose a variant of temporal difference learning with high data efficiency. A class of preconditioned temporal difference learning algorithms is also proposed to speed up the new met...
Journal
Journal title: IEEE Transactions on Intelligent Transportation Systems
Year: 2022
ISSN: 1558-0016, 1524-9050
DOI: https://doi.org/10.1109/tits.2021.3097297